This is one page of the R Handbook for Epidemiologists, but is being printed as a stand-alone page.
You can find the complete handbook on Github
Heatmaps can be useful when tracking reporting metrics across many facilities/jurisdictions over time
For example, the image below shows % of weekdays that data was received from each facility, week-by-week:
Often in public health, an objective is to assess trends over time for many entities (facilities, jurisdictions, etc.). One way to visualize trends over time from many entities is a heatmap where the x-axis is time and the y-axis are the many entities.
To demonstrate this, we import this dataset of daily malaria case reports from 65 facilities.
The preparation will involve:
Below are the first 30 rows of these data:
The packages we will use are:
The objective is to transform the daily reports (seen in previous tab) into weekly reports with a summary of performance - in this case the proportion of days per week that the facility reported any data for Spring District from April-May 2019.
To achieve this:
date2week() from package aweek
floor_day = argument means that dates are rounded into the week only (day of the week is not shown)factor = argument converts the new column to a factor - important because all possible weeks within the date range are designated as levels, even if there is no data for them currently.summarize() creates new columns to calculate reporting performance for each “facility-week”:
right_join()) to a comprehensive list of all possible facility-week combinations, to make the dataset complete. The matrix of all possible combinations is created by applying expand() to those two columns of the dataframe as it is at that moment in the pipe chain (represented by “.”). Because a right_join() is used, all rows in the expand() dataframe are kept, and added to agg_weeks if necessary. These new rows appear with NA (missing) summarized values.# Create weekly summary dataset
agg_weeks <- facility_count_data %>%
# filter the data as appropriate
filter(District == "Spring",
data_date < as.Date("2019-06-01")) %>%
# Create week column from data_date
mutate(week = aweek::date2week(data_date,
start_date = "Monday",
floor_day = TRUE,
factor = TRUE)) %>%
# Group into facility-weeks
group_by(location_name, week, .drop = F) %>%
# Create summary column on the grouped data
summarize(n_days = 7, # 7 days per week
n_reports = dplyr::n(), # number of reports received per week (could be >7)
malaria_tot = sum(malaria_tot, na.rm = T), # total malaria cases reported
n_days_reported = length(unique(data_date)), # number of unique days reporting per week
p_days_reported = round(100*(n_days_reported / n_days))) %>% # percent of days reporting
# Ensure every possible facility-week combination appears in the data
right_join(expand(., week, location_name)) # "." represents the dataset at that moment in the pipe chain
The ggplot() is make using geom_tile():
scale_x_date()fill is the performance for that facility-week (numeric)scale_fill_gradient() is used on the numeric fill, specifying colors for high, low, and NAscale_x_date() is used on the x-axis specifying labels every 2 weeks and their formatggplot(agg_weeks,
aes(x = aweek::week2date(week), # transformed to date class
y = location_name,
fill = p_days_reported))+
# tiles
geom_tile(colour="white")+ # white gridlines
scale_fill_gradient(low = "orange", high = "darkgreen", na.value = "grey80")+
scale_x_date(expand = c(0,0),
date_breaks = "2 weeks",
date_labels = "%d\n%b")+
# aesthetic themes
theme_minimal()+ # simplify background
theme(
legend.title = element_text(size=12, face="bold"),
legend.text = element_text(size=10, face="bold"),
legend.key.height = grid::unit(1,"cm"), # height of legend key
legend.key.width = grid::unit(0.6,"cm"), # width of legend key
axis.text.x = element_text(size=12),
axis.text.y = element_text(vjust=0.2),
axis.ticks = element_line(size=0.4),
axis.title = element_text(size=12, face="bold"),
plot.title = element_text(hjust=0,size=14,face="bold"),
plot.caption = element_text(hjust = 0, face = "italic")
)+
# plot labels
labs(x = "Week",
y = "Facility name",
fill = "Reporting\nperformance (%)", # legend title
title = "Percent of days per week that facility reported data",
subtitle = "District health facilities, April-May 2019",
caption = "7-day weeks beginning on Mondays.")If you want to order the y-axis facilities by something, convert them to class Factor and provide the order. Below, the order is set based on the total number of reporting days filed by the facility across the whole timespan:
facility_order <- agg_weeks %>%
group_by(location_name) %>%
summarize(tot_reports = sum(n_days_reported, na.rm=T)) %>%
arrange(tot_reports) # ascending orderas.tibble(facility_order)
## # A tibble: 15 x 2
## location_name tot_reports
## <chr> <int>
## 1 Facility 56 1
## 2 Facility 65 6
## 3 Facility 11 19
## 4 Facility 39 31
## 5 Facility 59 33
## 6 Facility 27 40
## 7 Facility 32 41
## 8 Facility 51 41
## 9 Facility 7 42
## 10 Facility 1 46
## 11 Facility 9 48
## 12 Facility 35 50
## 13 Facility 50 51
## 14 Facility 58 53
## 15 Facility 28 75Now use the above vector (facility_order$location_name) to be the order of the factor levels of location_name in the dataset agg_weeks:
agg_weeks <- agg_weeks %>%
mutate(location_name = factor(location_name, levels = facility_order$location_name))And now the data are re-plotted, with location_name being an ordered factor:
ggplot(agg_weeks,
aes(x = aweek::week2date(week), # transformed to date class
y = location_name,
fill = p_days_reported))+
# tiles
geom_tile(colour="white")+ # white gridlines
scale_fill_gradient(low = "orange", high = "darkgreen", na.value = "grey80")+
scale_x_date(expand = c(0,0),
date_breaks = "2 weeks",
date_labels = "%d\n%b")+
# aesthetic themes
theme_minimal()+ # simplify background
theme(
legend.title = element_text(size=12, face="bold"),
legend.text = element_text(size=10, face="bold"),
legend.key.height = grid::unit(1,"cm"), # height of legend key
legend.key.width = grid::unit(0.6,"cm"), # width of legend key
axis.text.x = element_text(size=12),
axis.text.y = element_text(vjust=0.2),
axis.ticks = element_line(size=0.4),
axis.title = element_text(size=12, face="bold"),
plot.title = element_text(hjust=0,size=14,face="bold"),
plot.caption = element_text(hjust = 0, face = "italic")
)+
# plot labels
labs(x = "Week",
y = "Facility name",
fill = "Reporting\nperformance (%)", # legend title
title = "Percent of days per week that facility reported data",
subtitle = "District health facilities, April-May 2019",
caption = "7-day weeks beginning on Mondays.")You can add a geom_text() layer on top of the tiles, to display the actual numbers of each tile. Be aware this may not look pretty if you have many small tiles!
geom_text(aes(label=p_days_reported))+. In the aesthetic aes() of the geom_tile() the argument label (what to show) is set to the same numeric column used to create the color gradient.ggplot(agg_weeks,
aes(x = aweek::week2date(week), # transformed to date class
y = location_name,
fill = p_days_reported))+
# tiles
geom_tile(colour="white")+ # white gridlines
geom_text(aes(label = p_days_reported))+ # add text on top of tile
scale_fill_gradient(low = "orange", high = "darkgreen", na.value = "grey80")+
scale_x_date(expand = c(0,0),
date_breaks = "2 weeks",
date_labels = "%d\n%b")+
# aesthetic themes
theme_minimal()+ # simplify background
theme(
legend.title = element_text(size=12, face="bold"),
legend.text = element_text(size=10, face="bold"),
legend.key.height = grid::unit(1,"cm"), # height of legend key
legend.key.width = grid::unit(0.6,"cm"), # width of legend key
axis.text.x = element_text(size=12),
axis.text.y = element_text(vjust=0.2),
axis.ticks = element_line(size=0.4),
axis.title = element_text(size=12, face="bold"),
plot.title = element_text(hjust=0,size=14,face="bold"),
plot.caption = element_text(hjust = 0, face = "italic")
)+
# plot labels
labs(x = "Week",
y = "Facility name",
fill = "Reporting\nperformance (%)", # legend title
title = "Percent of days per week that facility reported data",
subtitle = "District health facilities, April-May 2019",
caption = "7-day weeks beginning on Mondays.")Contoured heatmap of cases over a basemap
http://data-analytics.net/cep/Schedule_files/geospatial.html
pacman::p_load(OpenStreetMap)
# Fit basemap by range of lat/long coordinates. Choose tile type
map <- openmap(c(max(linelist$lat, na.rm=T), max(linelist$lon, na.rm=T)),
c(min(linelist$lat, na.rm=T), min(linelist$lon, na.rm=T)),
zoom = NULL,
type = c("osm", "stamen-toner", "stamen-terrain","stamen-watercolor", "esri","esri-topo")[1],
mergeTiles = TRUE)
# Projection WGS84
map.latlon <- openproj(map, projection = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
# Plot map. Must be autoplotted to work with ggplot
OpenStreetMap::autoplot.OpenStreetMap(map.latlon)+
# Density tiles
stat_density2d(aes(x = lon,
y = lat,
fill = ..level..,
alpha=..level..),
bins = 10,
geom = "polygon",
data = linelist) +
scale_fill_gradient(low = "black", high = "red")+
labs(x = "Longitude",
y = "Latitude",
title = "Distribution of simulated cases")+
theme(legend.position = "none") https://www.rdocumentation.org/packages/OpenStreetMap/versions/0.3.4/topics/autoplot.OpenStreetMap
This tab should stay with the name “Resources”. Links to other online tutorials or resources.